ROCm and HIP: A Detailed Tutorial in 10 Chapters: The Parallel Pivot: Mapping Sequential Logic onto GPU Threads

The Parallel Pivot is the fundamental shift in computational philosophy from temporal sequence (doing one thing after another) to spatial distribution (doing everything at once across a grid).

1. The Independence Heuristic

This is the golden rule of GPU computing: "Whenever your problem amounts to 'applying something independently to N elements', this is the first mapping to try." This data-parallel approach is the low-hanging fruit of GPU acceleration: the overhead of managing threads is negligible compared with the throughput gained by processing all the elements simultaneously.
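To make the independence test concrete, here is a minimal host-side C++ sketch (plain CPU code, not a HIP kernel). The names `scale_element`, `scale_sequential`, and `scale_reversed` are hypothetical and chosen only for illustration. The idea is that if the per-element operation touches nothing but its own element, the order of execution cannot change the result, and that is the property that lets a GPU run all N calls at once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-element operation: it reads and writes only element i,
// so no call depends on the result of any other call.
inline void scale_element(std::vector<float>& data, std::size_t i, float factor) {
    assert(i < data.size());
    data[i] *= factor;
}

// Sequential form: one worker visits the elements in ascending order.
inline void scale_sequential(std::vector<float>& data, float factor) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        scale_element(data, i, factor);
    }
}

// The same work in reverse order. Because the elements are independent,
// the result is identical to the sequential form.
inline void scale_reversed(std::vector<float>& data, float factor) {
    for (std::size_t i = data.size(); i-- > 0;) {
        scale_element(data, i, factor);
    }
}
```

If reordering the calls like this would change the output (for example, a running sum where each element depends on the previous one), the problem does not pass the heuristic as stated and needs a different mapping.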

2. Precision and Payload

HIP kernels typically operate on large arrays of primitive types. In high-performance graphics and machine learning we usually use float (single precision), while scientific simulations that demand extreme numerical stability use double (double precision).
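The difference between the two types can be observed with a small host-side C++ sketch. The helper names `repeated_sum`, `float_error`, and `double_error` are hypothetical, and the choice of repeatedly adding 0.1 is only an illustrative workload, not something taken from the tutorial. It accumulates the same value many times in each precision and measures how far the result drifts from the exact product.

```cpp
#include <cassert>
#include <cmath>

// Add `step` to an accumulator `count` times, in the precision given by T.
template <typename T>
T repeated_sum(T step, int count) {
    assert(count >= 0);
    T total = T(0);
    for (int k = 0; k < count; ++k) {
        total += step;
    }
    return total;
}

// Absolute error of the single-precision sum against count * 0.1.
inline double float_error(int count) {
    const double sum = static_cast<double>(repeated_sum<float>(0.1f, count));
    return std::fabs(sum - 0.1 * count);
}

// Absolute error of the double-precision sum against count * 0.1.
inline double double_error(int count) {
    return std::fabs(repeated_sum<double>(0.1, count) - 0.1 * count);
}
```

For long accumulations such as `float_error(1000000)` versus `double_error(1000000)`, the single-precision error is typically orders of magnitude larger. That is the kind of drift that pushes long-running simulations toward double, at the cost of twice the memory per element.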

3. From Iteration to Occupation

In CPU code, the processor "visits" the data through loops. In GPU logic, the data "occupies" a thread. Stop writing how to loop, and start writing what a single worker should do at a given coordinate.

$$\text{Index } i = \text{blockIdx.x} \times \text{blockDim.x} + \text{threadIdx.x}$$
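The index formula can be checked on the host before writing any kernel. The following C++ sketch emulates a one-dimensional launch on the CPU. `ThreadCoord`, `global_index`, `blocks_for`, and `emulate_launch` are hypothetical names standing in for the HIP built-ins `blockIdx.x`, `blockDim.x`, and `threadIdx.x`. The bounds guard `i < n` and the rounded-up block count are additions not stated in the text above; they are assumed here because the element count is rarely an exact multiple of the block size.

```cpp
#include <cassert>
#include <vector>

// Host-side stand-ins for the HIP built-ins of the same names (x dimension only).
struct ThreadCoord {
    int blockIdx_x;
    int blockDim_x;
    int threadIdx_x;
};

// The formula from the text: i = blockIdx.x * blockDim.x + threadIdx.x
inline int global_index(const ThreadCoord& c) {
    assert(c.blockDim_x > 0 && c.threadIdx_x >= 0 && c.threadIdx_x < c.blockDim_x);
    return c.blockIdx_x * c.blockDim_x + c.threadIdx_x;
}

// Number of blocks needed to cover n elements, rounded up.
inline int blocks_for(int n, int block_dim) {
    assert(n >= 0 && block_dim > 0);
    return (n + block_dim - 1) / block_dim;
}

// Emulates a 1D launch: every (block, thread) pair runs the same body once.
// Returns how many times each element was touched.
//
// In an actual HIP kernel the same body would read (not compiled here):
//   __global__ void touch(int* hits, int n) {
//       int i = blockIdx.x * blockDim.x + threadIdx.x;
//       if (i < n) hits[i] += 1;
//   }
inline std::vector<int> emulate_launch(int n, int block_dim) {
    const int grid_dim = blocks_for(n, block_dim);
    std::vector<int> hits(n, 0);
    for (int b = 0; b < grid_dim; ++b) {
        for (int t = 0; t < block_dim; ++t) {
            const int i = global_index({b, block_dim, t});
            if (i < n) {  // bounds guard: the last block may overhang n
                ++hits[i];
            }
        }
    }
    return hits;
}
```

For example, block 2 with 256 threads per block and thread 5 gives index 2 × 256 + 5 = 517. With 1000 elements and 256 threads per block, four blocks are needed, and the guard keeps the 24 surplus threads of the last block from writing past the end of the array.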

QUESTION 1

What is the primary heuristic for deciding if a problem is suitable for the 'Parallel Pivot'?

The problem requires complex recursion.

The problem involves applying an operation independently to N elements.

The problem must be solved in a strict temporal order.

The problem uses only integer arithmetic.

QUESTION 2

In the context of the Parallel Pivot, what does the term 'Occupation' refer to?

The CPU visiting each index in a for-loop.

How many blocks are currently queued in the GPU.

Data 'occupying' a specific thread at a specific coordinate.

The percentage of memory used by the float arrays.

QUESTION 3

Which data types do HIP kernels most commonly handle, including in scientific work that demands high numerical stability?

bool and char

int and long

float and double

void and pointer

QUESTION 4

When pivoting a loop into a kernel, what replaces the loop counter `i`?

The return value of the function.

A global thread identity calculated from grid/block dimensions.

The hipMalloc address.

The host-side iteration variable.

QUESTION 5

Fill in the blank: To ensure production reliability even in basic kernels, you must ______.

Only use float types.

Add explicit error-checking macros everywhere.

Use a single thread per block.

Avoid all boundary checks.